Recipient-independent, high-accuracy FMT-response prediction and optimization in mice and humans

Background Some microbiota compositions are associated with negative outcomes, including among others, obesity, allergies, and the failure to respond to treatment. Microbiota manipulation or supplementation can restore a community associated with a healthy condition. Such interventions are typically probiotics or fecal microbiota transplantation (FMT). FMT donor selection is currently based on donor phenotype, rather than the anticipated microbiota composition in the recipient and associated health benefits. However, the donor and post-transplant recipient conditions differ drastically. We here propose an algorithm to identify ideal donors and predict the expected outcome of FMT based on donor microbiome alone. We also demonstrate how to optimize FMT for different required outcomes. Results We show, using multiple microbiome properties, that donor and post-transplant recipient microbiota differ widely and propose a tool to predict the recipient post-transplant condition (engraftment success and clinical outcome), using only the donors’ microbiome and, when available, demographics for transplantations from humans to either mice or other humans (with or without antibiotic pre-treatment). We validated the predictor using a de novo FMT experiment highlighting the possibility of choosing transplants that optimize an array of required goals. We then extend the method to characterize a best-planned transplant (bacterial cocktail) by combining the predictor and a generative genetic algorithm (GA). We further show that a limited number of taxa is enough for an FMT to produce a desired microbiome or phenotype. Conclusions Off-the-shelf FMT requires recipient-independent optimized FMT selection. Such a transplant can be from an optimal donor or from a cultured set of microbes. We have here shown the feasibility of both types of manipulations in mouse and human recipients. Video Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s40168-023-01623-w.


Datasets: FMT from humans to humans
We collected microbiome data (16S rRNA sequences) of both human donors and human recipients from 8 human-to-human FMT experiments, referred to as human-to-human cohorts and colored in light orange in the figures. The datasets used are described in Table 2. For the clinical outcome predictions (success is response vs failure of FMT treatment at different time points after the FMT). We used 5 different cohorts: (a) inflammatory bowel disease (IBD) [4], as measured by Mayo score 3 months after FMT [5], (b) irritable bowel syndrome (IBS) measured by a decline in symptoms, (c)the response to PD-1 therapy in patients with melanoma [6], (d) ulcerative colitis (UC) measured by simple clinical colitis activity index scores (where ≥ 2 is ill) [7], and (e) antibiotics resistance (AR) measured by a decline in symptoms [7]. For more details see Table 3.

Microbiome characterization (human-to-GF and human-to-ABX mice)
DNA was extracted from all human and mouse fecal samples, using the MagMAX Microbiome Ultra-Kit (Thermo Fisher, Waltham, MA) according to the manufacturer's instructions and following a 2-minute bead beating step (BioSpec, Bartlesville, USA). The V4 region of the bacterial 16S rRNA gene was amplified by polymerase chain reaction (PCR) using the 515F (AAT-GATACGGCGACCACCGAGATCTACACGCT) barcoded and 806R (TATGGTAATTGTGT-GYCAGCMGCCGCGGTAA) primers [8] with a final concentration of 0.04% of each primer 1 and 0.5% of PrimeSTAR Max DNA Polymerase (Takara-Clontech, Shiga, Japan) in 50µl total volume. PCR reactions were carried out by 30-35 cycles of denaturation (95°C), annealing (55°C) and extension (72°C), with final elongation at 72°C. PCR products were purified using XP magnetic beads (Beckman Coulter, Indianapolis, IN) and quantified using the Picogreen dsDNA quantitation kit (Invitrogen, Carlsbad, CA). Samples were then pooled in equal amounts, loaded on 2% agarose E-Gel (Thermo Fisher, Waltham, MA), purified and sent for sequencing using the Illumina MiSeq platform (Genomic Center, Azrieli Faculty of Medicine, Bar-Ilan University, Israel).

Dataset download and processing
Most datasets studied here are publicly available in the NCBI website. Those were downloaded using a newly developed package YaMaS github.com/YarinBekor/YaMAS. The Yamas package is a comprehensive tool that combines various packages and tools, such as the SRA-toolkit https: //trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software, QIIME2 [9], and DADA2 plugin [10]. Its primary function is to enable easy downloading of DNA (Deoxyribonucleic acid) datasets from the NCBI (National Center for Biotechnology Information) and ENA (European Nucleotide Archive). The package is designed to be user-friendly, efficient and simple, even for non-programmers. The package follows a straightforward process. First, it downloads the raw DNA dataset. It then selects the appropriate pipeline for data processing. To transform short reads into fastq files, the package utilizes the SRA toolkit. To process 16S data type, it uses DADA2 from the QIIME package for denoising and utilizes QIIME for clustering the data and creating taxonomy, OTU, and phylogeny tree files. Conversely, for shotgun data, the package relies on the metaphlan package [11] to cluster the data and generate all relevant files.

Data merging, preprocessing, and normalization
When combining datasets, we padded with zeros the ASVs that were missing in some of the datasets. We applied the MIPMLP preprocessing [12]. For the models' inputs, we merged the donors' ASVs to the species taxonomy, by the mean method and a log normalization was followed. Notice that we analyzed the human-to-GF and human-to-human datasets separately.

Model parameters: Identifying ideal FMT donors and predicting FMT outcomes
The first model aimed to predict the donor maximizing the post-FMT Shannon's diversity index. In addition, models were built to predict FMT outcome based on donor data alone, focusing on predictions of relative abundances of all orders, the relative abundance of the 50 or 100 most prevalent species in the recipient mouse or human samples, respectively, following FMT, and presence/absence of the 50 or 100 most prevalent species in the recipient mouse or human sample respectively following FMT. Specific considerations and methods are described below.
Shannon's diversity As a measure of FMT donor quality, we used the Shannon alpha diversity (Shannon) of the mice samples (recipients) t days after the transplant, s i,t . Shannon's diversity index quantifies the uncertainty in predicting the species identity of an individual that is taken at random from the dataset [13]. Mathematically, it is defined by Shannon diversity = − s i=1 p i · ln(p i ), where p is the proportion (n/N ) of individuals of one particular species found (n) divided by the total number of individuals found (N ) and s is the number of species. High alpha diversity is related to health, while low alpha diversity is related to different diseases.

Order-level relative abundances
We predicted the relative abundances of the 10 (mouse) or 30 (human) different orders present in the recipients after the transplant, s i , based on donor sample profiles alone (a model per order). The orders' relative abundances were defined by merging the data at the order level in the MIPMLP pipeline via the sum merge function. A relative normalization was applied to each vector (see above).

Species-level relative abundances
Similarly, we predicted the relative abundances of the 50 most prevalent species of the recipients after the transplant in mice, s i , and of the 100 most prevalent species in human samples (a model per specie). A relative normalization was applied to each vector (see above).

Species-level prevalence
Finally, we predicted the binary presence or absence of the 50 most prevalent species from the mouse samples and of the 100 most prevalent species from the human samples after the transplant, s i (a model per specie).

FMT success
We also used our model to assess FMT outcomes in human-to-human FMT 5 clinical cohorts such as IBD measured by the improvement in Mayo score, response to PD-1 treatment in melanoma patients, etc.

Predictive models
When predicting outcomes, the time of the outcome was one of the input features of the model. For the training of the models, all the time steps of each sample i, t i in the training set were used.
We also added metadata (donor's age, sex, and weight) when available. The missing metadata was completed by the median for the continuous variables, and by another category for the binary ones.

Linear predictors
We predicted the recipient outcomes at each time point after the FMT, s i,ti using some models implemented in sklearn, such as: a KNN (K nearest neighbors) regressor, SVR (Support Vector Machine regression) and Ridge regression [14,15,16]. The inputs of the models were the preprocessed ASVs from MIPMLP, a i and a scalar of the number of days after the FMT t i . We used a grid search over human to GF datasets and used the same hyperparameters for all the order and specie tasks. The same hyperparameters were used for all the tasks of the human-to-human datasets. No hyperparameter tuning was applied for the human-to-human tasks. For the best hyperparameters, see Supp. Mat. Table S1 and for the search space see Table S4.

Decision tree models
The inputs of the models were the preprocessed ASVs from MIPMLP, a i and a scalar of the number of days after the FMT t i . We used grid search in order to optimize the hyperparameters only on the Shannon task on the human to GF datasets and used the same hyperparameters for all the order and species tasks. The same hyperparameters were used for all the tasks of the human-to-human datasets. No hyperparameter tuning was applied in the human-to-human tasks. For the best hyperparameters, see Supp. Mat. Table S2 and for the search space see  Table S5. We used the XGBOOST regressor of xgboost [17] as well as the Random Forest (RF) regressor and classifier of sklearn [18].

Fully connected neural networks
We used two fully connected neural networks on the processed ASVs from the MIPMLP a i and a scalar of the number of days after the FMT t i . We used dropout L1 and L2 regularization to the network to avoid overfitting. We ran Neural Network Intelligence (NNI) [19] to find the best activation function and the number of neurons in each layer only on the Shannon task over the human-to-GF cohorts and used the same hyperparameters for all the order and specie tasks. The same hyperparameters were used for all the tasks of the human-to-human datasets. No hyperparameter tuning was applied for the human-to-human tasks. For the best hyperparameters, see Supp. Mat. Table S2 and for the search space see Table S5.
iMic -Image Microbiome iMic translates the microbiome ASVs vectors to images, by using the structure of the taxonomy tree and the relations between the taxa in each group [20]. Once the donors' microbiome input was translated to an image, we used CNNs followed by two fully connected layers for the prediction itself. We fine-tuned the model's hyperparameters using an NNI framework on 10 cross-validations only on the Shannon task over the human-to-GF cohorts. We then used the same hyperparameters for all the order and specie tasks. The same hyperparameters were used for all the tasks of the human-to-human datasets. No hyperparameter tuning was applied for the human-to-human tasks. The best hyperparameters used are in Supp. Mat. Table S3 and the search space can be found in Table S6. We further used iMic with the same hyperparameters with additional metadata of the donors (concatenated to the flattened CNNs' output before the fully connected layers), including the donors' age, sex and weight at the transplant. We used the same metadata to predict all the time points after the FMT. We filled the missing data of the continuous variables by the variable median of the cohort. There was no missing data in the categorical variables. Notice that for the success-failure predictions, we used the same hyperparameters of all the human-to-human tasks. No further hyperparameter tuning was applied.

Training test split
The data was divided into three groups. The training set consisted of 60% of the data, a validation set that contained 20% of the data for hyperparameter tuning (in the human to GF cohort only) and early stopping and an external test set of another 20% of the data, for the task evaluation (on which all the results are reported here). During the split, we ensured all the samples of the same mouse were in the same training, validation, or test set to prevent data leakage.
For all learning tasks, we used 10 cross-validations for the stability of the results. The results were reported as an average of 10 runs on the external test set.

Hyperparameter tuning
We computed the best hyperparameters for each model (KNN, SVR, LR, RF, XGBOOST, NN and iMic models) over the human-to-GF cohorts on the Shannon diversity task using a 10-fold cross-validation [21] on the internal validation only for the Shannon prediction. The same hyperparameters were used for all the other taxa frequency prediction tasks. We chose the hyperparameters according to the average R2 on the 10 validations. The platform we used for the optimization of the hyperparameters was NNI for the more complex models (NN, XGBOOST, RF, and iMic models). For the simple models (KNN, SVR and Ridge), a grid search was applied. The tuned hyperparameters were: the coefficient of the L1 loss, the weight decay (L2regularization), the activation function (RelU,elU or tanh, which makes the model non-linear), the number of neurons in the fully connected layers, dropout (a regularization method which zeros the neurons in the layers in the dropout's probability), batch size and learning rate. For the CNN models, we also had the kernel sizes as well as the strides and the padding as hyperparameters. We chose the best activation function from RelU, ElU and tanh. The number of neurons was chosen relative to the input dimension. The first linear division factor from the input size was chosen randomly from [1,11]. The second layer division factor was chosen from [1,6]. The kernel sizes were defined by two different hyperparameters, a parameter for its length and its width. The length was in the range of [1,8] and the width was in the range of [1,20]. The strides were in the range of [1,9] and the channels were in the range of [1,16]. For the search space tables see Supp. Mat. Tables S4-S6. No hyperparameter tuning was done for the human-to-human tasks. We used the appropriate hyperparameters from the human-to-GF tasks.

Identifying most and least optimal donors for the FMT validation experiment
Three groups were defined from the predicted properties (plotted as two groups for the sake of simplicity in Fig. 3 A). The first group is samples predicted to have a high Shannon after the FMT and the second group is predicted to have a low one (representing the top and bottom 20 % of the expected values. Given the association between the donor age and the transplant outcome, and that the entire second group was comprised of child donor samples, we also identified a third group, age-matched to the first, but expected to result in low diversity in donors (though not as low as the second group).

FMT experiment
After mice were pretreated with antibiotics and randomized into groups, FMT was administered via oral gavage as previously described [22]. Briefly, several grams of human donor fecal matter were resuspended in sterile PBS. 200µl of the solution was then delivered directly to the mouse's stomach using a flexible feeding tube. Two mice received FMT from each donor and then were housed together. The FMT was repeated for a second and final time 1 week after the first FMT to improve engraftment. Fecal samples were collected for microbiota profiling, and mice weights were recorded before each FMT and once per week for the six weeks following the second FMT, after which the mice were euthanized. Fecal microbiota changes were profiled through the experiment following FMT, and data were used to test model predictions of (1) Shannon diversity and also of another focal parameter -relative abundances at the (2) order level.

Genetic Algorithm (GA)
100 simulated "parent" donor populations were randomly chosen from all the simulated donor populations of all the cohorts (human-to-GF), a i (Fig. 6 A. The following steps were then applied: step A). A binary representation, b i , of the MIPMLP preprocessed donor vectors, a i , was created for each donor (Fig.6 A step B). Each MIPMLP preprocessed donor vector, a i , got into the pretrained iMic model with a scalar of 7 days after FMT. iMic returned the predicted outcome of each recipient (Fig. 6 A, step C). All the predicted outcomes, s i , were used for the fitness function for selection. We used the following fitness function: such that sum(b i ) represents the number of non-zero taxa in the donor sample, and γ is a hyperparameter that controls the importance of solving the problem with minimum non-zero taxa. For the minimization tasks, we used the following fitness function: such that textsum(b i ) represents the number of non-zero taxa in the donor sample, and γ is a hyperparameter that controls the importance of solving the problem with maximum non-zero taxa. Note that we do not explicitly require a given number of taxa in the optimized microbe combination. Instead, we penalize that by assuming a cost to each different microbe. As such, a new microbe is only added if it significantly improves the required outcome. Also, the limitation is distinct for each candidate donor. Each taxon can be separately present in each candidate combination.
To complete the donors' parents of the next generation a mutation (see below) occurs with a probability of 0.3, and a recombination (see below) occurs with a probability of 0.3 (Fig. 6  A, step E). Then the stopping rule is checked. If the stopping rule is met, the donors of step E are returned; otherwise, the new generation of donors from step E is the initial values in step C, till the stopping rule is met. The GA was applied for 25 generations since it converged within this number of generations. In mutations, a zero entry of the binary vector b i from the 30 new optimized donors is replaced by one vector with a probability of 0.3. Then the entry of the MIMPLP preprocessed ASVs vector, a ′ i is changed, respectively. In recombinations, half of a sample a i and half of a sample a j from the 30 new optimized donors are concatenated into a new recombined sample, a * k . 6 2 FMT studies analysis 2.1 Microbiome properties distribution analysis      : Recipient's orders can be predicted from donor's samples only (human-to-human datasets). Different models evaluation scores of recipient's relative abundances SCC. The colors and symbols are as above (orange for the non-iMic models and pink for the iMic model).

Models hyperparameters
All models' hyperparameters were optimized on a validation set of 20% of the data. The search was applied by a grid-search to the simple models (KNN, SVR and Ridge) and by an NNI to the more complicated models (NN, XGBOOST,iMic1 and iMic2).      Kernel size 2 second convolution layer (columns) - [1,10]